ABSTRACT

The relative power of the Mann-Whitney statistic, the
t-statistic, the median test, a test based on exceedances (A,B), and
two special cases of (A,B), the Tukey quick test and the revised Tukey
quick test, was investigated via a Monte Carlo experiment. These
procedures were compared across four population probability models:
uniform, beta, normal, and double exponential. Sample sizes of (5,5),
(10,10), (20,20), (5,10), and (5,20) were among those used. Results
indicate the median test should be considered for distributions which
contain outliers. The exceedances tests can be powerful alternatives
to more standard procedures if the underlying distributions are
platykurtic. (Author)
INTRODUCTION

In the last few years, a great deal of information has been published
regarding the robustness of the t-statistic and other normal distribution
theory hypothesis testing procedures. In general, these procedures are
remarkably robust when the underlying assumptions are violated, especially
with respect to control over errors of the first type. Exceptions occur
when both the variances and sample sizes are unequal and under some
conditions of rather extreme non-normality, primarily skewness. A very
comprehensive review of the research on the robustness of the
Student-procedure is reported by Hatch and Posten (1966).

While a great deal of research has been conducted on the robustness
of the t-statistic, and a few of its distribution free competitors, this
research has tended to focus on a rather narrow definition of robustness;
i.e., the control over Type I errors. Violation of the assumptions
necessary for the exactness of any hypothesis testing procedure also affects
its control over Type II errors. Conditions of non-normality and variance
heterogeneity, while not always detrimental to the performance of the
t-statistic's control over the nominal significance level, sometimes have
a very noticeable effect on the t-test's power, especially relative to
other hypothesis testing procedures, PratocMiiraj (1970).
Purpose of the Study

As is well known, many of the distributions which exist in educational
and psychological research are non-normal in nature. Many carefully
constructed standardized tests yield raw score distributions which are by
necessity bounded and relatively flat or platykurtic in nature. Some of
these distributions are, in fact, nearly rectangular or uniform, Brandenburg
(1972). Another, contrasting, situation is one in which an occasional
large measurement error will produce a highly disparate observation. This
tends to create underlying distributions which are leptokurtic (peaked).

The major objective of this study was to investigate the relative
power of four two-sample hypothesis testing procedures across four different
underlying distributions for five variations of sample size. The four
statistical procedures investigated were (1) the t-test (t), (2) the
Mann-Whitney U test (U), (3) the median test (r), and (4) two variations
of a test based on exceedances: a procedure described by Hajek (1969)
which will be designated by (A,B) and a procedure recommended by Tukey (1959)
referred to as (A+B).

Description of Statistics Investigated and Probability Models Sampled 

In order to empirically determine the relative Type I error control
and power of the various hypothesis testing procedures, four probability
distributions were used as sampling models. Each of these distributions
was continuous and symmetric, but each differed primarily in tail weight
or degree of kurtosis, $K = E[(X - \mu)^4]/\sigma^4$. These distributions
were: 1) the double exponential, 2) the normal, 3) the uniform, and
4) a lambda distribution (Ramberg and Schmeiser, 1972) with tail weight
(K = 2.3) between the normal and uniform distributions.
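
The kurtosis measure K can be checked numerically for three of the four models. The sketch below is illustrative only and is not part of the original study: it assumes standard forms of the uniform, normal, and double exponential quantile functions (the function names are hypothetical) and approximates the moments with a midpoint rule over the unit interval. The lambda distribution is omitted because its parameter values are not recoverable from the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def q_uniform(p):
    """Quantile function of the uniform distribution on (0, 1)."""
    return p

def q_normal(p):
    """Quantile function of the standard normal distribution."""
    return _STD_NORMAL.inv_cdf(p)

def q_double_exp(p):
    """Quantile function of the double exponential with median 0, scale 1."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

def kurtosis(quantile, n=200_000):
    """Approximate K = E[(X - mu)^4] / sigma^4 by a midpoint rule in p."""
    xs = [quantile((i + 0.5) / n) for i in range(n)]
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2

for name, q in [("uniform", q_uniform), ("normal", q_normal),
                ("double exponential", q_double_exp)]:
    print(name, round(kurtosis(q), 2))
```

The approximations should land close to 1.8, 3.0, and 6.0 respectively; the double exponential value converges most slowly because of its heavy tails.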
Among the rank tests the two-sample median test is locally most
powerful when the underlying distributions are double exponential. The
double exponential distribution is characterized by its long (heavy)
tails (K = 6.0). The Mann-Whitney U test is the uniformly most powerful
rank test when the underlying distributions are logistic. The logistic
distribution is somewhat lighter (or shorter) tailed (K = 4.2) than the
double exponential, but still heavier tailed than the normal probability
model (K = 3.0). It is well known that the t-statistic is uniformly
most powerful if the underlying distributions are normal. The two tests
based on exceedances (A,B) and (A+B) are each locally most powerful, for
different alternatives, when the sample distributions are uniform (K = 1.8).

The test statistic for the (A,B) test used in testing H0: F(x) = F(y),
against H1: F(x) > F(y), is the ordered pair (a,b) where a is the number
of y's greater than the largest x, and b is the number of x's less than
the smallest y. The (A,B) test assumes that the pairs are ordered by the
following rule:

(A,B) > (A',B') if either min(A,B) > min(A',B') ...

Then the pair (A,B) whose values (a,b) are ordered as above provides
a one ended test of H0: F(x) = F(y). The (A,B) test is locally most
powerful for uniform distributions with small mean differences.

The test statistic for the (A+B) test is a + b where a and b are the
same as for the (A,B) test. This procedure is locally most powerful for
uniform distributions with "large" mean differences.
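
The exceedance counts a and b defined above are simple to compute. The following is a minimal sketch, not the authors' program; the function name and the example data are hypothetical.

```python
def exceedances(x, y):
    """Return (a, b): a = number of y's above the largest x,
    b = number of x's below the smallest y."""
    largest_x = max(x)
    smallest_y = min(y)
    a = sum(1 for value in y if value > largest_x)
    b = sum(1 for value in x if value < smallest_y)
    return a, b

# Hypothetical samples in which the y's tend to be larger than the x's.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.5, 4.5, 6.0, 7.0, 8.0]

a, b = exceedances(x, y)
print((a, b), a + b)  # (3, 3) 6
```

The pair (a, b) is the statistic for the (A,B) test and the sum a + b is the statistic for the (A+B) test; critical values are not shown here.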
Graphs and density functions of the probability distributions
sampled are given in Figure 1. Since these graphs do not clearly show
the distinction in tail weights of the distributions, each distribution
was rescaled to have the same median and .95 quantile as the standardized
normal. This comparison of the tails of the four distributions is shown
in Figure 2.
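
The rescaling used for the tail comparison can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each distribution is centered at median 0 and multiplied by a constant chosen so that its .95 quantile matches that of the standard normal. The quantile functions and names are hypothetical, and the lambda distribution is left out because its parameters are not given in the text.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def q_normal(p):
    return Z.inv_cdf(p)

def q_uniform(p):
    # Uniform on (-0.5, 0.5), so the median is 0.
    return p - 0.5

def q_double_exp(p):
    # Double exponential with median 0 and unit scale.
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

def rescaled(quantile, anchor=0.95):
    """Scale a median-zero quantile function so that its `anchor`
    quantile equals the standard normal's."""
    c = Z.inv_cdf(anchor) / quantile(anchor)
    return lambda p: c * quantile(p)

for name, q in [("uniform", q_uniform), ("normal", q_normal),
                ("double exponential", q_double_exp)]:
    r = rescaled(q)
    print(name, round(r(0.95), 3), round(r(0.99), 3))
```

After rescaling, all three share the same .95 quantile, and the .99 quantiles order themselves by tail weight: uniform below normal below double exponential.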

Procedures 

The procedure used to generate the empirical sampling distributions
of the hypothesis testing procedures investigated is described in the
following steps:

1. Vectors of m + n elements, randomly drawn from
each of the four population distributions, were
obtained. The first m elements came from an X-universe
having mean $\mu_x$ and variance $\sigma_x^2$ and the remaining
n from a Y-universe with mean $\mu_y$ and variance $\sigma_y^2$.
Each of the m + n elements in the vector was
obtained by generating a uniform random number
between zero and one, which was regarded as a
relative cumulative frequency of the uniform
distribution. The random variable for each of the
other distributions investigated (lambda, normal,
double exponential) was then obtained through what
amounted to an area transformation.

2. Five combinations of sample size (m,n) [(5,5);
(10,10); (20,20); (5,10); (5,20)] and five values
of Δ (0(1)4) were selected for investigation for
each of the four probability models.

3. For each vector of m + n observations, the
statistics t, U, r, (a,b), and (a+b) were computed.
This procedure was repeated 1000 times for each
combination of (m,n), population distribution,
and Δ-value.

4. For each of the above replications the test
statistics were referred to their respective
.05 two-ended critical values. Since critical
values corresponding to a significance level
of exactly .05 do not ordinarily exist for the
rank type procedures, a randomization process
was used which ensured that each would have a
nominal level of .05.
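
The randomization device in step 4 can be illustrated with the Mann-Whitney U statistic. The paper does not describe the randomization process in detail, so the sketch below is one common construction and an assumption on our part: the exact null distribution of U is enumerated, the most extreme outcomes are always rejected, and the boundary outcomes are rejected with probability gamma so that the overall size is exactly .05. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def u_null_distribution(m, n):
    """Exact null distribution of U (rank sum of x minus m(m+1)/2),
    assuming no ties and equally likely rank assignments."""
    counts = {}
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2
        counts[u] = counts.get(u, 0) + 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in counts.items()}

def randomized_two_sided(m, n, alpha=Fraction(1, 20)):
    """Return (d, gamma): reject when |U - mn/2| > d, and reject with
    probability gamma when |U - mn/2| == d."""
    center = Fraction(m * n, 2)
    mass = {}
    for u, p in u_null_distribution(m, n).items():
        dev = abs(u - center)
        mass[dev] = mass.get(dev, Fraction(0)) + p
    size = Fraction(0)
    for dev in sorted(mass, reverse=True):
        if size + mass[dev] > alpha:
            return dev, (alpha - size) / mass[dev]
        size += mass[dev]
    return Fraction(0), Fraction(1)

d, gamma = randomized_two_sided(5, 5)
print(d, gamma)  # 19/2 23/30
```

For m = n = 5 the non-randomized two-sided test has size 8/252, about .032; rejecting with probability 23/30 when U equals 3 or 22 brings the size to exactly .05.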

Results 

The empirical Type I error and power values (times 1000) obtained
for this investigation are presented in Table 1. In general, these
results are consistent with predictions obtained from asymptotic theory.
In the discussion which follows the various hypothesis testing procedures
will be compared for the different population models sampled. Of the
two tests based on exceedances all references will be to (A+B). Little
practical difference existed between (A+B) and (A,B) and because of the
simpler decision rule associated with A + B it seems to be the preferable
procedure.
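
Empirical rejection rates of the kind tabulated in Table 1 can be estimated with a short simulation. The sketch below is not the authors' program and covers only the t-test under the normal model. It assumes that Δ is an additive shift applied to the Y sample in standard deviation units, draws observations by passing uniform random numbers through a quantile function, and hard-codes 2.101 as the two-sided .05 critical value of t with 18 degrees of freedom (m = n = 10). All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def draw(quantile, rng):
    """Inverse-transform draw: pass a uniform(0, 1) number through a
    quantile function (the 'area transformation')."""
    u = rng.random()
    while u == 0.0:  # the normal quantile is undefined at exactly 0
        u = rng.random()
    return quantile(u)

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    return (mean(y) - mean(x)) / math.sqrt(sp2 * (1 / m + 1 / n))

def rejection_rate(quantile, m, n, delta, t_crit, reps=1000, seed=0):
    """Proportion of replications in which |t| exceeds t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [draw(quantile, rng) for _ in range(m)]
        y = [draw(quantile, rng) + delta for _ in range(n)]
        if abs(pooled_t(x, y)) > t_crit:
            rejections += 1
    return rejections / reps

T_CRIT_DF18 = 2.101  # two-sided .05 critical value, 18 df
type_one = rejection_rate(STD_NORMAL.inv_cdf, 10, 10, 0.0, T_CRIT_DF18)
power = rejection_rate(STD_NORMAL.inv_cdf, 10, 10, 2.0, T_CRIT_DF18)
print(round(1000 * type_one), round(1000 * power))
```

With 1000 replications the standard error of an estimated rate near .05 is about .007, which gives a sense of how much sampling noise to expect in tabled values of this kind.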

The results may be summarized by sampled distribution as follows:

1. Double exponential. Across the various sample
sizes studied both t and U exhibit excellent
power. There appears to be little reason to
prefer either of these procedures although U was
slightly more powerful for the larger equal sized samples.
The most surprising result for this population model
was the very poor performance of the median test (r).
While this procedure is the locally most powerful
(Δ small) of the rank tests for double exponential
distributions, the only case in which it was in any
way comparable to t and U was when m = n = 20.
Considering the manner in which the (A+B) procedure
is defined it performed surprisingly well except
for m = n = 20.

2. Normal. As was expected, t was the superior
procedure for this case. However, as is well known,
the Mann-Whitney statistic performs very well when
the underlying distributions are normal. Once again
(r) was inferior to (A+B) except when m = n = 20.
3. Lambda. For this relatively flat population
model (K = 2.3), t was superior to all statistics
investigated. There appears to be little reason
to prefer either U or (A+B) and both would seem
to be reasonable alternatives to the t-statistic.
The median test is noticeably less powerful
than any other across all sample size combinations.

4. Uniform. (A+B) and t are the preferable
procedures for distributions of this type. The
t-statistic is slightly more powerful than
(A+B) in the m = n = 5 case and there appears
to be little difference between the two methods
for m ≠ n. For the larger equal sample sizes
(A+B) is the superior method, markedly so in
the m = n = 20 case. Although less powerful than
t and (A+B), the U statistic performs reasonably
well for rectangular distribution types. This
is especially true relative to r which is
markedly inferior to all procedures.

Selected results from Table 1 discussed above are illustrated in Figures
3 through 7.

In summary, it appears that t is probably overall the superior
statistic although for "heavy" tailed distributions U is a very competitive
alternative and for "lighter" tailed underlying densities the tests based
on exceedances are attractive alternatives, especially (A+B) because of its
simplicity. With the exception of large samples from leptokurtic population
models the median test has little to offer relative to the other procedures
investigated.
FIGURE 1

PROBABILITY DISTRIBUTIONS SAMPLED

[Figure not reproduced. Surviving annotations: "(inverse of the CDF)";
"where p is uniform, 0 ≤ p ≤ 1"; "|x| < 2.6"; and the uniform density,
f(x) = 1 for 0 < x < 1, 0 otherwise.]
FIGURE 2

UPPER 5% TAILS OF DISTRIBUTIONS SAMPLED
FIGURE 3

Empirical Power Values and Smoothed Power Curves
for t, U, r, and A+B for Double Exponential Distributions
m = n = 10, α = .05

FIGURE 4

Empirical Power Values and Smoothed Power Curves
for t, r, and A+B for Lambda Distributions
m = n = 20, α = .05
FIGURE 5

Empirical Power Values and Smoothed Power Curves
for t, r, and A+B for Uniform Distributions
m = n = 20, α = .05
FIGURE 6

Empirical Power Values and Smoothed Power Curves
for t, U, r, and A+B for Uniform Distributions
m = 20, n = 5, α = .01